Frequent itemset mining is a popular data mining technique. Apriori, Eclat, and FP-Growth are among the most common algorithms for frequent itemset mining. Considerable research has compared the relative performance of these three algorithms by evaluating how each one scales as the dataset size increases. While scalability with data size is important, previous papers have not examined how performance varies across similarly sized datasets that contain different itemset characteristics. This paper explores the effects that two dataset characteristics can have on the performance of these three frequent itemset algorithms. To perform this empirical analysis, a dataset generator is created to measure the effects of frequent item density and maximum transaction size on performance. The generated datasets contain the same number of rows. This provides some insight into the dataset characteristics that are conducive to each algorithm. The results demonstrate that Eclat and FP-Growth both handle increases in maximum transaction size and frequent itemset density considerably better than the Apriori algorithm.
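The abstract does not describe how the dataset generator works, so the following is only a minimal sketch of one way such a generator could be built, not the authors' implementation. The function name `generate_transactions`, all of its parameters, and the interpretation of "frequent item density" as the probability that each slot in a transaction is filled from a small pool of designated frequent items are assumptions made for illustration. The row count is held fixed while the two characteristics are varied.

```python
import random


def generate_transactions(n_rows, n_items, max_transaction_size,
                          frequent_density, n_frequent=10, seed=0):
    """Return n_rows transactions, each a sorted list of distinct item ids.

    Hypothetical parameters (not taken from the paper):
      frequent_density: probability that a slot is filled from the
                        pool of frequent items (ids 0 .. n_frequent - 1).
      max_transaction_size: upper bound on items per transaction.
    """
    rng = random.Random(seed)
    frequent_items = list(range(n_frequent))
    rare_items = list(range(n_frequent, n_items))

    transactions = []
    for _ in range(n_rows):
        # Transaction length is uniform in [1, max_transaction_size].
        size = rng.randint(1, max_transaction_size)
        # Decide how many slots come from the frequent pool.
        n_freq = sum(rng.random() < frequent_density for _ in range(size))
        n_freq = min(n_freq, len(frequent_items))
        n_rare = min(size - n_freq, len(rare_items))
        # Sample without replacement so items within a row are distinct.
        items = rng.sample(frequent_items, n_freq) + rng.sample(rare_items, n_rare)
        transactions.append(sorted(items))
    return transactions


# Two datasets with the same number of rows but different characteristics.
sparse_short = generate_transactions(1000, 200, max_transaction_size=5,
                                     frequent_density=0.2)
dense_long = generate_transactions(1000, 200, max_transaction_size=30,
                                   frequent_density=0.8)
```

Under these assumptions, raising `frequent_density` concentrates more transactions on the same few items, and raising `max_transaction_size` lengthens the rows. Both changes would be expected to increase the number of candidate itemsets a miner has to consider, while the dataset size in rows stays constant.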